Skip to content

Conversation

@YangZhou1997
Copy link
Member

Description

Please include a summary of the changes and the related issue.

Fixes # (issue)

Type of Change

  • Bug fix
  • New feature
  • Documentation update

How Has This Been Tested?

Include any tests here.

  • Unit tests
  • Integration tests
  • Manual testing

Checklist

  • My code follows the style guidelines, e.g. format.sh.
  • I have run build_and_install.sh to verify compilation.
  • I have removed redundant variables and comments.
  • I have updated the documentation.
  • I have added tests.

@YangZhou1997 YangZhou1997 changed the title config rdma [EP] config rdma Jan 6, 2026
// Add self-ranks, sub other ranks
if (thread_id < kNumRanks) {
atomicAdd_system(barrier_signal_ptrs[rank] + thread_id, FINISHED_SUM_TAG);
memory_fence();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the memory_fence() needed here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perror("Failed to query device attributes");
exit(1);
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this and
attr.max_dest_rd_atomic = dev_attr.max_qp_init_rd_atom;
It gets really slow on AMD platform.

}
#if defined(__HIP_PLATFORM_AMD__) || defined(__HIPCC__)
memory_fence();
#endif
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Base automatically changed from ep-debug-amd-mem-consistency to main January 18, 2026 01:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants